Data Joins + Transformations

Monday, April 24

Today we will…

Lambda Functions

We know how to use across() to apply built-in functions across many columns.

cereal |> 
  summarize(across(calories:potass, mean))
  calories  protein      fat   sodium    fiber   carbo   sugars   potass
1 106.8831 2.545455 1.012987 159.6753 2.151948 14.5974 6.922078 96.07792

But what if we want to pass additional arguments to a function?

  • E.g., we need to drop missing data (na.rm = T).
  • E.g., we need to calculate a trimmed mean (trim = 0.1).

Lambda Functions

When you want to specify additional arguments, you need to use a lambda function rather than just the function name:

~ <FUN_NAME>(.x, <ARGS>)

Calculate trimmed means, trimming 10% of the data on each end.

cereal |> 
  summarize(across(calories:potass, ~ mean(.x, trim = 0.1)))
  calories protein       fat  sodium    fiber    carbo   sugars   potass
1 107.3016 2.47619 0.8888889 163.254 1.773016 14.71429 6.920635 86.19048

Data Layouts

Tidy Data

Tidy data…

  • is rectangular.
  • has variables as rows and observations as columns.
  • has different formats for different tasks.

R4DS

Consequences of Messy Data

Illustration by Allison Horst

  • Tidy: use the same tools in similar ways for different datasets.
  • Messy: create unique tools that are difficult to generalize.

Creating Tidy Data

We may need to transform our data to turn it into the version of tidy that is best for a task at hand.

Illustration by Allison Horst

Creating Tidy Data

We want to look at mean cereal nutrients based on shelf.

  • The data are in a wide format – a separate column for each nutrient.
  • Transforming the data will make plotting easier.
library(liver)
data(cereal)
head(cereal)
name manuf type calories protein fat sodium fiber carbo sugars potass vitamins shelf weight cups rating
100% Bran N cold 70 4 1 130 10.0 5.0 6 280 25 3 1 0.33 68.40297
100% Natural Bran Q cold 120 3 5 15 2.0 8.0 8 135 0 3 1 1.00 33.98368
All-Bran K cold 70 4 1 260 9.0 7.0 5 320 25 3 1 0.33 59.42551
All-Bran with Extra Fiber K cold 50 4 0 140 14.0 8.0 0 330 25 3 1 0.50 93.70491
Almond Delight R cold 110 2 2 200 1.0 14.0 8 -1 25 3 1 0.75 34.38484
Apple Cinnamon Cheerios G cold 110 2 2 180 1.5 10.5 10 70 25 1 1 0.75 29.50954

Creating Tidy Data

Code
cereal_wide <- cereal |> 
  group_by(shelf) |> 
  summarise(across(calories:vitamins, mean))
shelf calories protein fat sodium fiber carbo sugars potass vitamins
1 102.5000 2.650000 0.60 176.2500 1.6850000 15.80000 4.800000 75.50000 20.00000
2 109.5238 1.904762 1.00 145.7143 0.9047619 13.61905 9.619048 57.80952 23.80952
3 107.7778 2.861111 1.25 158.6111 3.1388889 14.50000 6.527778 129.83333 35.41667
Code
my_colors <- c("calories_col" = "steelblue", "sugars_col" = "orange3")

cereal_wide |> 
  ggplot() +
  geom_point(aes(x = shelf, y = calories, color = "calories_col")) +
  geom_line(aes(x = shelf, y = calories, color = "calories_col")) + 
  geom_point(aes(x = shelf, y = sugars, color = "sugars_col")) +
  geom_line(aes(x = shelf, y = sugars, color = "sugars_col")) +
  scale_color_manual(values = my_colors, labels = names(my_colors)) +
  labs(x = "Shelf", y = "", subtitle = "Mean Amount", color = "Nutrient")

Code
cereal_long<- cereal |> 
  pivot_longer(cols = calories:vitamins,
               names_to = "Nutrient",
               values_to = "Amount") |> 
  group_by(shelf, Nutrient) |> 
  summarise(mean_amount = mean(Amount))
shelf Nutrient mean_amount
1 calories 102.5000000
1 carbo 15.8000000
1 fat 0.6000000
1 fiber 1.6850000
1 potass 75.5000000
1 protein 2.6500000
1 sodium 176.2500000
1 sugars 4.8000000
1 vitamins 20.0000000
2 calories 109.5238095
2 carbo 13.6190476
2 fat 1.0000000
2 fiber 0.9047619
2 potass 57.8095238
2 protein 1.9047619
2 sodium 145.7142857
2 sugars 9.6190476
2 vitamins 23.8095238
3 calories 107.7777778
3 carbo 14.5000000
3 fat 1.2500000
3 fiber 3.1388889
3 potass 129.8333333
3 protein 2.8611111
3 sodium 158.6111111
3 sugars 6.5277778
3 vitamins 35.4166667
Code
cereal_long |> 
  ggplot(aes(x = shelf, 
             y = mean_amount, 
             color = Nutrient)) +
  geom_point() +
  geom_line() +
  labs(x = "Shelf", y = "", subtitle = "Mean Amount")

Pivoting Data

Tidyexpalin animation by Kelsey Gonzalez

Manual Method

Consider daily rainfall observed in SLO in January 2023.

  • The data is in a human-friendly form (like a calendar).
  • Each week has a row, and each day has a column.

Data source

How would you manually convert this to long format?

Manual Method: Steps

  1. Create a new column: Day_of_Week.
  2. Create a new column: Rainfall (hold daily rainfall values).
  3. Now we have three columns (Week, Day_of_Week, and Rainfall) – start moving Sunday values over.
  4. Duplicate Week 1-5 and copy Monday values over.
  5. Duplicate Week 1-5 and copy Tuesday values over.
  6. Continue for the rest of the days of the week.
  7. You may want to arrange() by Week to get the rainfall values chronological order.

Computational Approach

We can use pivot_longer() to turn a wide dataset into a long(er) dataset.

pivot_longer()

Take a wide dataset and turn it into a long daaset.

  • cols – specify the columns that should be pivoted.
    • Do not include the names of ID columns (columns to not be pivoted).
  • names_to – the name of the new column containing the old column names.
  • values_to – the name of the new column containing the old column values.

pivot_longer()

library(readxl)
slo_rainfall <- read_xlsx("data/2023-rainfall-slo.xlsx")

slo_rainfall |> 
  mutate(across(Sunday:Saturday, as.numeric)) |> 
  pivot_longer(cols      = Sunday:Saturday,
               names_to  = "Day_of_Week",
               values_to = "Daily_Rainfall")
Week Day_of_Week Daily_Rainfall
1 Sunday 0.00
1 Monday 0.12
1 Tuesday 0.00
1 Wednesday 1.58
1 Thursday 0.91
1 Friday 0.00
1 Saturday 0.05
2 Sunday 0.27
2 Monday 4.26
2 Tuesday 0.43
2 Wednesday 0.00
2 Thursday 0.00
2 Friday 0.16
2 Saturday 1.41
3 Sunday 0.34
3 Monday 0.33
3 Tuesday 0.00
3 Wednesday 0.00
3 Thursday 0.13
3 Friday 0.00
3 Saturday 0.00
4 Sunday 0.00
4 Monday 0.00
4 Tuesday 0.00
4 Wednesday 0.00
4 Thursday 0.00
4 Friday 0.00
4 Saturday NA
5 Sunday NA
5 Monday NA
5 Tuesday NA
5 Wednesday NA
5 Thursday NA
5 Friday NA
5 Saturday NA

pivot_wider()

Take a long dataset and turn it into a wide daaset.

  • id_cols – specify the column(s) that contain the ID for unique rows in the wide dataset.
  • names_from – the name of the column containing the new column names.
  • values_from – the name of the column containing the new column values.

pivot_wider()

We calculate the mean amount of protein for cereals on each shelpf and for each manufacturer.

mean_protein <- cereal |> 
  group_by(manuf, shelf) |> 
  summarize(mean_protein = mean(protein))
manuf shelf mean_protein
A 2 4.000000
G 1 3.000000
G 2 1.285714
G 3 2.666667
K 1 2.750000
K 2 2.142857
K 3 2.916667
N 1 2.666667
N 2 2.500000
N 3 4.000000
P 1 1.500000
P 2 1.000000
P 3 3.000000
Q 1 5.000000
Q 2 2.000000
Q 3 2.500000
R 1 2.000000
R 3 3.000000

pivot_wider()

mean_protein |> 
  arrange(shelf) |> 
  pivot_wider(id_cols = manuf,
              names_from = shelf,
              values_from = mean_protein)
manuf 1 2 3
G 3.000000 1.285714 2.666667
K 2.750000 2.142857 2.916667
N 2.666667 2.500000 4.000000
P 1.500000 1.000000 3.000000
Q 5.000000 2.000000 2.500000
R 2.000000 NA 3.000000
A NA 4.000000 NA

Better names in pivot_wider()

mean_protein |> 
  arrange(shelf) |> 
  pivot_wider(id_cols = manuf,
              names_from = shelf,
              values_from = mean_protein,
              names_prefix = "Shelf_")
manuf Shelf_1 Shelf_2 Shelf_3
G 3.000000 1.285714 2.666667
K 2.750000 2.142857 2.916667
N 2.666667 2.500000 4.000000
P 1.500000 1.000000 3.000000
Q 5.000000 2.000000 2.500000
R 2.000000 NA 3.000000
A NA 4.000000 NA

Data Joins

Relational Data

Multiple, interconnected tables of data are called relational.

  • It is the relation between data sets, not just the individual data sets themselves, that are important.

IMDb movie relational data

Data Joins

We can combine (join) data tables based on their relations.

Mutating joins

Add variables from a new dataframe to observations in an existing dataframe.

full_join(), left_join(), right_join(), inner_join(), outer_join()

Filtering Joins

Filter observations based on values in new dataframe.

semi_join(), anti_join()

Keys

A key uniquely identifies an observation in a data set.

  • To combine (join) two datasets, the key needs to be present in both.

inner_join()

Keeps obsertvations when their keys are present in both datasets.

inner_join(): IMDb Example

directors_genres
director_id genre prob
429 Adventure 0.750000
429 Fantasy 0.750000
2931 Drama 0.714286
2931 Action 0.428571
11652 Sci-Fi 0.500000
11652 Action 0.500000
14927 Animation 1.000000
14927 Family 1.000000
15092 Comedy 0.545455
15092 Crime 0.545455
movies_directors
director_id movie_id
429 300229
9247 124110
11652 10920
11652 333856
14927 192017
15092 109093
15092 237431

ID: 429, 2931, 11652, 14927, 15092       ID: 429, 9247, 11652, 14927, 15092

inner_join(directors_genres, movies_directors)
director_id genre prob movie_id
429 Adventure 0.750000 300229
429 Fantasy 0.750000 300229
11652 Sci-Fi 0.500000 10920
11652 Sci-Fi 0.500000 333856
11652 Action 0.500000 10920
11652 Action 0.500000 333856
14927 Animation 1.000000 192017
14927 Family 1.000000 192017
15092 Comedy 0.545455 109093
15092 Comedy 0.545455 237431
15092 Crime 0.545455 109093
15092 Crime 0.545455 237431

ID: 429, 2931, 9247, 11652, 14927, 15092

inner_join(): IMDb Example

What if our key does not have the same name?

directors_genres
director_id genre prob
429 Adventure 0.750000
429 Fantasy 0.750000
2931 Drama 0.714286
2931 Action 0.428571
11652 Sci-Fi 0.500000
11652 Action 0.500000
14927 Animation 1.000000
14927 Family 1.000000
15092 Comedy 0.545455
15092 Crime 0.545455
directors
id first_name last_name
429 Andrew Adamson
9247 Zach Braff
11652 James (I) Cameron
14927 Ron Clements
15092 Ethan Coen
inner_join(directors_genres, 
           directors, 
           join_by(director_id == id))
id first_name last_name genre prob
429 Andrew Adamson Adventure 0.750000
429 Andrew Adamson Fantasy 0.750000
11652 James (I) Cameron Sci-Fi 0.500000
11652 James (I) Cameron Action 0.500000
14927 Ron Clements Animation 1.000000
14927 Ron Clements Family 1.000000
15092 Ethan Coen Comedy 0.545455
15092 Ethan Coen Crime 0.545455

Join by different variables on dataX and dataY: join_by(a == b) will match dataX$a to dataY$b.

Piping Joins

Remember: the dataset you pipe in becomes the first argument of the function you are piping into!

  • So if you are using a pipe, you will only be specifying the right dataset inside the join function.
inner_join(directors_genres, movies_directors)

…is equivalent to…

directors_genres |> 
  inner_join(movies_directors)

Mutating Joins

  • left_join() – keep only (and all) observations in the left data set

  • right_join() – keep only (and all) observations in the right data set

  • full_join() – keep all observations in both data sets

Mutating Joins

Which directors would remain for each of the following?

  • left_join(directors_genres, movies_directors)
  • right_join(directors_genres, movies_directors)
  • full_join(directors_genres, movies_directors)

directors_genres |> 
  distinct(director_id)
director_id
429
2931
11652
14927
15092
movies_directors |> 
  distinct(director_id)
director_id
429
9247
11652
14927
15092

Filtering Joins: semi_join()

Keeps observations when their keys are present in both datasets, but only keeps variables from the first dataset.


→  

Filtering Joins: semi_join()

directors_genres |> 
  semi_join(movies_directors)
director_id genre prob
429 Adventure 0.750000
429 Fantasy 0.750000
11652 Sci-Fi 0.500000
11652 Action 0.500000
14927 Animation 1.000000
14927 Family 1.000000
15092 Comedy 0.545455
15092 Crime 0.545455

Movie Directors: 429, 2931, 11652, 14927, 15092

directors_genres |>
  filter(director_id %in% movies_directors$director_id)
director_id genre prob
429 Adventure 0.750000
429 Fantasy 0.750000
11652 Sci-Fi 0.500000
11652 Action 0.500000
14927 Animation 1.000000
14927 Family 1.000000
15092 Comedy 0.545455
15092 Crime 0.545455

Filtering Joins: anti_join()

Removes observations when their keys are present in both datasets, and only keeps variables from the first dataset.


→  


Filtering Joins: anti_join()

directors_genres |> 
  anti_join(movies_directors)
director_id genre prob
2931 Drama 0.714286
2931 Action 0.428571

Movie Directors: 429, 2931, 11652, 14927, 15092

directors_genres |>
  filter(!director_id %in% movies_directors$director_id)
director_id genre prob
2931 Drama 0.714286
2931 Action 0.428571

PA 4: Military Spending

Today you will be tidying messy data to explore the relationship between countries of the world and military spending.

  • Due Wednesday, 4/26 at 10:00am

Bonus Challenge: Murder in SQL City

For this challenge, you will be using table joins to solve a murder mystery!

  • Completing this Challenge will require the manipulation of strings – we will focus on this next week.
  • Due Monday, 5/8 at 11:59pm

To do…

  • PA 4: Military Spending
    • Due Wednesday, 4/26 at 10:00am
  • Bonus Challenge: Murder in SQL City
    • Due Monday, 5/8 at 11:59pm

Wednesday, April 26

Today we will…

  • Review Lab 3: Familiarity with AAE
  • Miscellaneous…
    • Clean Variable Names
    • Lifecycle Stages
  • Extensions to Relational Data
  • Lab 4: Avocados
  • Challenge 3: Avocado Toast Ate My Mortgage

Lab 3: Familiarity with AAE

Lab 3: Familiarity with AAE – Sketch it out!

For each demographic group listed below, determine all words in this study that were the most and least familiar, on average.

Clean Variable Names with janitor

Data from external sources likely has variable names not ideally formatted for R.

Names may…

  • contain spaces.
  • start with numbers.
  • start with a mix of capital and lower case letters.
names(military)[1:12]
 [1] "Country"        "Notes"          "Reporting year" "1988"          
 [5] "1989"           "1990"           "1991"           "1992"          
 [9] "1993"           "1994"           "1995"           "1996"          

Clean Variable Names with janitor

The janitor package converts all variable names in a dataset to snake_case.

Names will…

  • start with a lower case letter.
  • have spaces filled in with _.
library(janitor)
military_clean_names <- military |> 
  clean_names()

names(military_clean_names)[1:12]
 [1] "country"        "notes"          "reporting_year" "x1988"         
 [5] "x1989"          "x1990"          "x1991"          "x1992"         
 [9] "x1993"          "x1994"          "x1995"          "x1996"         

Lifceycle Stages

As packages get updated, the functions and function arguments included in those packages will change.

  • The accepted syntax for a function may change.
  • A function/functionality may disappear.

Learn more about lifecycle stages of packages, functions, function arguments in R.

Deprecated Functions

A deprecated functionality has a better alternative available and is scheduled for removal.

  • You get a warning telling you what to use instead.
military_clean |> 
  filter(across(Notes:`2019`, is.na)) |> 
  slice_head(n = 3) |> 
  select(1:8)
Warning: Using `across()` in `filter()` was deprecated in dplyr 1.0.8.
ℹ Please use `if_any()` or `if_all()` instead.
# A tibble: 3 × 8
  Country      Notes `Reporting year` `1988` `1989` `1990` `1991` `1992`
  <chr>        <chr> <chr>            <chr>  <chr>  <chr>  <chr>  <chr> 
1 Africa       <NA>  <NA>             <NA>   <NA>   <NA>   <NA>   <NA>  
2 North Africa <NA>  <NA>             <NA>   <NA>   <NA>   <NA>   <NA>  
3 Sub-Saharan  <NA>  <NA>             <NA>   <NA>   <NA>   <NA>   <NA>  

Deprecated Functions

You should not use deprecated functions!

Instead, we use…

military_clean |>
  filter(if_all(Notes:`2019`, ~ is.na(.x))) |> 
  slice_head(n = 3) |> 
  select(1:8)
# A tibble: 3 × 8
  Country      Notes `Reporting year` `1988` `1989` `1990` `1991` `1992`
  <chr>        <chr> <chr>            <chr>  <chr>  <chr>  <chr>  <chr> 
1 Africa       <NA>  <NA>             <NA>   <NA>   <NA>   <NA>   <NA>  
2 North Africa <NA>  <NA>             <NA>   <NA>   <NA>   <NA>   <NA>  
3 Sub-Saharan  <NA>  <NA>             <NA>   <NA>   <NA>   <NA>   <NA>  

Superceded Functions

A superseded functionality has a better alternative, but is not going away .

  • This is a softer alternative to deprecation.
  • A superseded function will not give a warning (since there’s no risk if you keep using it), but the documentation will give you a recommendation.

Extensions to Relational Data

Relational Data

When we work with multiple tables of data, we say we are working with relational data.

  • It is the relations, not just the individual datasets, that are important.

When we work with relational data, we rely on keys.

  • A key uniquely identifies an observation in a dataset.
  • A key allows us to relate datasets to each other

IMDb Movies Data

How can we find each director’s active years?

Joining Multiple Data Sets

movies_directors[1:5,]
# A tibble: 5 × 2
  director_id movie_id
        <dbl>    <dbl>
1         429   300229
2        2931   254943
3        9247   124110
4       11652    10920
5       11652   333856
directors[1:5,]
# A tibble: 5 × 3
     id first_name last_name
  <dbl> <chr>      <chr>    
1   429 Andrew     Adamson  
2  2931 Darren     Aronofsky
3  9247 Zach       Braff    
4 11652 James (I)  Cameron  
5 14927 Ron        Clements 
movies[1:5,]
# A tibble: 5 × 4
     id name           year  rank
  <dbl> <chr>         <dbl> <dbl>
1 10920 Aliens         1986  8.20
2 17173 Animal House   1978  7.5 
3 18979 Apollo 13      1995  7.5 
4 30959 Batman Begins  2005 NA   
5 46169 Braveheart     1995  8.30

movies_directors |> 
  inner_join(directors, 
             join_by(director_id == id))
# A tibble: 41 × 4
   director_id movie_id first_name last_name
         <dbl>    <dbl> <chr>      <chr>    
 1         429   300229 Andrew     Adamson  
 2        2931   254943 Darren     Aronofsky
 3        9247   124110 Zach       Braff    
 4       11652    10920 James (I)  Cameron  
 5       11652   333856 James (I)  Cameron  
 6       14927   192017 Ron        Clements 
 7       15092   109093 Ethan      Coen     
 8       15092   237431 Ethan      Coen     
 9       15093   109093 Joel       Coen     
10       15093   237431 Joel       Coen     
# ℹ 31 more rows
movies_directors |> 
  inner_join(directors, 
             join_by(director_id == id)) |> 
  inner_join(movies,
             join_by(movie_id == id)) |> 
  rename(movie_name = name)
# A tibble: 41 × 7
   director_id movie_id first_name last_name movie_name               year  rank
         <dbl>    <dbl> <chr>      <chr>     <chr>                   <dbl> <dbl>
 1         429   300229 Andrew     Adamson   Shrek                    2001  8.10
 2        2931   254943 Darren     Aronofsky Pi                       1998  7.5 
 3        9247   124110 Zach       Braff     Garden State             2004  8.30
 4       11652    10920 James (I)  Cameron   Aliens                   1986  8.20
 5       11652   333856 James (I)  Cameron   Titanic                  1997  6.90
 6       14927   192017 Ron        Clements  Little Mermaid, The      1989  7.30
 7       15092   109093 Ethan      Coen      Fargo                    1996  8.20
 8       15092   237431 Ethan      Coen      O Brother, Where Art T…  2000  7.80
 9       15093   109093 Joel       Coen      Fargo                    1996  8.20
10       15093   237431 Joel       Coen      O Brother, Where Art T…  2000  7.80
# ℹ 31 more rows

Joining on Multiple Variables

Using the AAE data from Lab 3…

music
# A tibble: 10,752 × 6
   word          subj   folk  rock country   pop
   <chr>         <chr> <dbl> <dbl>   <dbl> <dbl>
 1 [to be] ghost p15       0     1       3     4
 2 [to be] ghost p53       0     0       1     3
 3 [to be] ghost p136      0     0       0     0
 4 [to be] ghost p36       1     1       2     2
 5 [to be] ghost p118      0     1       0     3
 6 [to be] ghost p106      0     0       1     1
 7 [to be] ghost p70       2     0       0     0
 8 [to be] ghost p107      0     1       3     1
 9 [to be] ghost p102      0     0       0     0
10 [to be] ghost p83       0     4       4     4
# ℹ 10,742 more rows
familiarity
# A tibble: 10,752 × 3
   word          participant familiarity
   <chr>         <chr>             <dbl>
 1 [to be] ghost p15                   1
 2 [to be] ghost p53                   1
 3 [to be] ghost p136                  1
 4 [to be] ghost p36                   1
 5 [to be] ghost p118                  1
 6 [to be] ghost p106                  1
 7 [to be] ghost p70                   1
 8 [to be] ghost p107                  1
 9 [to be] ghost p102                  5
10 [to be] ghost p83                   1
# ℹ 10,742 more rows

{.r .cell-code code-line-numbers="3"-4""} music_wordfam <- music |> full_join(familiarity, join_by(word == word, subj == participant)) music_wordfam

# A tibble: 10,752 × 7
   word          subj   folk  rock country   pop familiarity
   <chr>         <chr> <dbl> <dbl>   <dbl> <dbl>       <dbl>
 1 [to be] ghost p15       0     1       3     4           1
 2 [to be] ghost p53       0     0       1     3           1
 3 [to be] ghost p136      0     0       0     0           1
 4 [to be] ghost p36       1     1       2     2           1
 5 [to be] ghost p118      0     1       0     3           1
 6 [to be] ghost p106      0     0       1     1           1
 7 [to be] ghost p70       2     0       0     0           1
 8 [to be] ghost p107      0     1       3     1           1
 9 [to be] ghost p102      0     0       0     0           5
10 [to be] ghost p83       0     4       4     4           1
# ℹ 10,742 more rows

Lab + Challenge

Lab 4: Avocados + Challenge 4: Avocado Toast Ate My Mortgage


Handy Helpers

rename() – Change names of columns

separate() – Separate values of a variable


Filtering Joins

semi_join(): Keeps values found in another data set

anti_join(): Keeps values not found in another data set

Workflow

  1. Load packages + read in original data
library(tidyverse)
data_original <- read_csv(file = "path/to/datal.csv")
  1. Clean data – save your changes! This is now your new “master” data set
data_clean <- data_original |> 
  mutate(across(x1:x5, ~ as.factor(.x))) |> 
  mutate(new_var <- if_else(...))
  1. If you need subsets, create those from your new “master” data for the specific tasks.
demographics_subj <- data_clean |> 
  distinct(subj, keep_all = TRUE)
  1. Output only the the information you want to include in your assignment.

To do…

  • Lab 4: Avocados
    • Due Friday, 4/28 at 11:59pm
  • Challenge 4: Avocado Toast Ate My Mortgage
    • Due Saturday, 4/29 at 11:59pm
  • Read Chapter 5: Special Data Types
    • Check-ins 5.1 + 5.2 + 5.3 due Monday (5/1) at 10:00am
  • Bonus Challenge: Murder Mystery in SQL City
    • Due Monday 5/8 at 11:59pm